Dplyr

Quantitative Methodology (UPF)

Jordi Mas Elias

https://www.jordimas.cat/

Summary

  • Pipe
  • Row dplyr functions
  • Column dplyr functions
  • Transform variables

Warm up

R learning curve

RStudio workflow

Load packages.

library(dplyr)
library(ggplot2)
library(stringr)

Pipe


One function

f(d$v)
f(d, v)
d |> f(v)
1:100 |> 
   sample(4)

Various functions

f2(f1(d, v1), v2)
d |> 
  f1(v1) |> 
  f2(v2)
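A minimal sketch of the equivalence above, using only base R functions. The vector x and its values are made up for illustration, and the native pipe |> assumes R 4.1 or later.

```r
# Hypothetical numeric vector, used only for illustration
x <- c(4, 9, 16, 25)

# Nested calls: read from the inside out
nested <- round(mean(sqrt(x)), 1)

# Piped calls: read from top to bottom, one step per line
piped <- x |>
  sqrt() |>
  mean() |>
  round(1)

# Both forms give the same result
identical(nested, piped)
```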

Pipe

Without pipe

go_to_upf(take_transport(get_dressed(get_out_of_bed(wake_up(me,
time = "8:30"), side = "left"), pants = TRUE, shirt = TRUE), 
bus = TRUE, metro = FALSE), mqi = FALSE, bar = TRUE)

With pipe

me |> 
  wake_up(time = "8:30") |> 
  get_out_of_bed(side = "left") |> 
  get_dressed(pants = TRUE, shirt = TRUE) |> 
  take_transport(bus = TRUE, metro = FALSE) |> 
  go_to_upf(mqi = FALSE, bar = TRUE)

Example taken from here and here.

Row dplyr functions

Filter

Keeps only the rows that meet a given criterion.

df |> 
  filter(logic_vector)
  • The condition must always evaluate to a logical vector (TRUE or FALSE for each row).
  • Can use & (AND), | (OR), ! (NOT).
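A short sketch of filter() with each operator. The data frame df and its columns (country, region, gdp) are invented for illustration.

```r
library(dplyr)

# Hypothetical data, invented for illustration
df <- data.frame(
  country = c("A", "B", "C", "D"),
  region  = c("Europe", "Asia", "Europe", "Africa"),
  gdp     = c(30, 12, 45, 5)
)

# AND: keep rows where both conditions are TRUE
rich_europe <- df |>
  filter(region == "Europe" & gdp > 40)

# OR: keep rows where at least one condition is TRUE
asia_or_poor <- df |>
  filter(region == "Asia" | gdp < 10)

# NOT: keep rows where the condition is FALSE
not_europe <- df |>
  filter(!(region == "Europe"))
```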

Arrange

  • Sorts the rows from lowest to highest…
df |> 
  arrange(vector)
  • …or from highest to lowest.
df |> 
  arrange(desc(vector))
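For instance, with a made-up data frame (the names df, country and gdp are assumptions for this sketch):

```r
library(dplyr)

# Hypothetical data, invented for illustration
df <- data.frame(
  country = c("A", "B", "C"),
  gdp     = c(30, 12, 45)
)

# Lowest gdp first
ascending <- df |>
  arrange(gdp)

# Highest gdp first
descending <- df |>
  arrange(desc(gdp))
```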

Count

  • Counts the number of observations in each category of a vector…
df |> 
  count(vector)
  • …and orders the results from most to least frequent.
df |> 
  count(vector, sort = TRUE)
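A small sketch of both calls, assuming a hypothetical data frame with a single region column. The counts are returned in a column named n.

```r
library(dplyr)

# Hypothetical data, invented for illustration
df <- data.frame(
  region = c("Europe", "Asia", "Europe", "Africa", "Europe", "Asia")
)

# One row per category, ordered by category name
by_region <- df |>
  count(region)

# Same counts, most frequent category first
by_region_sorted <- df |>
  count(region, sort = TRUE)
```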

Column dplyr functions

Select

  • Selects vectors.
df |> 
  select(vector1, vector4, vector6:vector9)
  • Removes vectors.
df |> 
  select(-vector2)
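A minimal sketch, assuming a made-up data frame whose columns are id, name, x1, x2 and x3:

```r
library(dplyr)

# Hypothetical data, invented for illustration
df <- data.frame(
  id   = 1:2,
  name = c("a", "b"),
  x1   = c(1, 2),
  x2   = c(3, 4),
  x3   = c(5, 6)
)

# Keep id plus the consecutive range x1 to x3
kept <- df |>
  select(id, x1:x3)

# Drop a single column and keep the rest
dropped <- df |>
  select(-name)
```

Both results contain the same four columns, reached in two different ways.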

Rename

Renames the vector.

df |> 
  rename(new_name = Old.Vector.Name)
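As an illustration, with an invented column name of the kind often found in downloaded datasets:

```r
library(dplyr)

# Hypothetical data, invented for illustration
df <- data.frame(
  country        = c("A", "B"),
  GDP.Per.Capita = c(30, 12)
)

# New name goes on the left, old name on the right
renamed <- df |>
  rename(gdp_pc = GDP.Per.Capita)
```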

Mutate

  • Modifies the values of a vector…
df |> 
  mutate(vector5 = vector5 * 100)
  • …or creates a new vector.
df |> 
  mutate(new_vector = vector5 + vector6)
df |> 
  mutate(new_vector = vector5 + vector6)
  • Several operations can be combined. They run in order, so new_vector below uses the already modified vector5.
df |> 
  mutate(vector5 = vector5 * 100,
         new_vector = vector5 + vector6)
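A sketch of a combined mutate() with made-up columns (share and bonus are assumptions). It shows that the second expression sees the value produced by the first one.

```r
library(dplyr)

# Hypothetical data, invented for illustration
df <- data.frame(
  share = c(0.25, 0.5),
  bonus = c(1, 2)
)

out <- df |>
  mutate(share = share * 100,    # overwrite an existing vector
         total = share + bonus)  # uses the updated share (25 and 50)
```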

Summarize

  • Summarizes data.
df |> 
  summarize(name = sum(vector))
  • Different elements can be summarized:
df |> 
  summarize(name1 = sum(vector),
            name2 = mean(vector),
            n = n())

*Mind the argument na.rm = TRUE: without it, sum() and mean() return NA if the vector contains missing values.
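A small sketch of why na.rm matters, using an invented gdp vector with one missing value:

```r
library(dplyr)

# Hypothetical data with one missing value, invented for illustration
df <- data.frame(gdp = c(10, 20, NA, 30))

res <- df |>
  summarize(total_raw = sum(gdp),                 # NA: the missing value propagates
            total     = sum(gdp, na.rm = TRUE),   # missing value dropped first
            average   = mean(gdp, na.rm = TRUE),
            n         = n())                      # counts all rows, NA included
```

Note that n() still counts the row with the missing value.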

Group_by

  • Groups the data by the values of a vector. It is always combined with another function (e.g. summarize, filter, mutate).
df |> 
  group_by(vector1) |> 
  summarize(name1 = sum(vector2))

*With group_by and summarize, we change the unit of observation of the dataset.
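A sketch of the change in unit of observation, assuming made-up country-level data: four rows (countries) go in, two rows (regions) come out.

```r
library(dplyr)

# Hypothetical country-level data, invented for illustration
df <- data.frame(
  region = c("Europe", "Asia", "Europe", "Asia"),
  gdp    = c(30, 12, 45, 8)
)

# One row per region instead of one row per country
by_region <- df |>
  group_by(region) |>
  summarize(total_gdp = sum(gdp),
            n         = n())
```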

Recoding vectors

Recoding

When we recode variables (vectors), we usually lose information, because several original values collapse into a single category.

Target        Function
Binary        if_else()
Categorical   case_when()
Ordinal       factor()
Any           recode()
Other         as.numeric(), as.character(), as.Date(), etc.

Boolean operators

  • AND (&): TRUE if all conditions are met.
  • OR (|): TRUE if any condition is met.
  • NOT (!): TRUE if conditions are not met.
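The three operators can be tried directly on logical vectors in base R. The age vector is made up for illustration.

```r
# Hypothetical ages, invented for illustration
age <- c(15, 30, 70)

# AND: TRUE only where both comparisons hold
working_age <- age >= 18 & age < 65

# OR: TRUE where at least one comparison holds
dependent <- age < 18 | age >= 65

# NOT: flips TRUE and FALSE
minor <- !(age >= 18)
```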

If_else

  • Recodes to a dichotomous / binary / dummy variable.
df |> 
  mutate(new_name = if_else(condition, value_if_true, value_if_false))
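A minimal sketch with an invented gdp vector and an arbitrary cut-off of 20. A missing value in the condition stays missing in the result.

```r
library(dplyr)

# Hypothetical data, invented for illustration
df <- data.frame(gdp = c(5, 30, NA))

out <- df |>
  mutate(rich = if_else(gdp > 20, "rich", "not rich"))
```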

Case_when

case_when(condition_1 ~ "C1",
          condition_2 ~ "C2",
          condition_3 ~ "C3",
          ...,
          TRUE ~ "CN")
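A sketch with three categories, assuming made-up gdp values and arbitrary thresholds. Conditions are checked in order and the first one that is TRUE wins, so the second condition only applies to values of 5 or more.

```r
library(dplyr)

# Hypothetical data, invented for illustration
df <- data.frame(gdp = c(2, 15, 40))

out <- df |>
  mutate(income = case_when(gdp < 5  ~ "low",
                            gdp < 25 ~ "middle",
                            TRUE     ~ "high"))  # everything not matched above
```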

Factor

df |> 
  mutate(new_vector = factor(vector, 
                             ordered = TRUE,
                             [levels or labels = ...]))
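A sketch of an ordinal recode, assuming an invented income column with three categories. The levels argument sets the order; without it, R would order the categories alphabetically.

```r
library(dplyr)

# Hypothetical data, invented for illustration
df <- data.frame(income = c("high", "low", "middle", "low"))

out <- df |>
  mutate(income_ord = factor(income,
                             levels  = c("low", "middle", "high"),
                             ordered = TRUE))

# Ordered factors can be compared with < and >
below_high <- out$income_ord < "high"
```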

Recode

df |> 
  mutate(new_vector = recode(vector, 
                             old_value = "new_value"))
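A minimal sketch with made-up region codes. Newer dplyr versions mark recode() as superseded, so this assumes a version where the function is still available.

```r
library(dplyr)

# Hypothetical data, invented for illustration
df <- data.frame(region = c("EU", "AS", "EU"))

# Old value on the left, new value on the right
out <- df |>
  mutate(region_full = recode(region,
                              EU = "Europe",
                              AS = "Asia"))
```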

As functions

  • as.numeric(vector)
  • as.factor(vector)
  • as.character(vector)
  • as.integer(vector)
  • as.Date(vector)
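A quick base R sketch of each conversion, with invented input values. Text that cannot be read as a number becomes NA, with a warning.

```r
# Character to numeric
num <- as.numeric(c("1", "2.5", "3"))

# Character to integer
int <- as.integer("7")

# Number to character
chr <- as.character(10)

# Character to factor (levels sorted alphabetically by default)
fct <- as.factor(c("b", "a", "b"))

# Character in year-month-day format to Date
d <- as.Date("2024-01-31")

# Text that is not a number becomes NA (warning silenced here)
bad <- suppressWarnings(as.numeric("abc"))
```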